Rearranging Germ-Line DNA Segments
to Form Antibody Genes 



For several decades an enormous number (per- 
haps in the millions) of different antibody (im- 
munoglobulin) molecules have been known to 
exist, each characterized by a unique site that can 
bind to specific molecular determinants (anti- 
gens). Many immunologists thought initially that 
all antibodies were made of the same polypeptide 
chains and that their uniqueness arose from the 
way their newly synthesized identical polypeptide 
chains folded around the respective antigens. This 
theory was proved wrong. Each antibody has its 
own amino acid sequence, and each antibody-pro- 
ducing cell (plasma cell) makes only one antibody. 
At first, this was a disturbing discovery because it 
seemed to imply that a separate gene would have 
to exist for each separate antibody. If so, perhaps 
a large fraction, if not the majority, of the verte- 
brate DNAs would have to be devoted to coding 
antibody molecules. But such speculations could 
not be tested until protein chemists established the 
basic structure of the antibody molecule.

The Basic Structure of Antibody 
Molecules Is Established 

The first insights began to emerge in the early 
1960s, when it was realized that the fundamental 
antibody unit consists of two identical light (L) 
chains of molecular weight 17,000 and two identi- 
cal heavy (H) chains of molecular weight 35,000, 
held together by disulfide bonds (Figure 9-1).
(The terms "light" and "heavy" refer to the differ- 
ences in the molecular weight of the chains.) Each 
such four-chain unit contains two identical binding 
sites for antigens, with a site being formed partly

Figure 9-1

The structure of an antibody protein. Two 
light chains (color) and two heavy chains 
(white) are held together by disulfide bonds. 
The light chains and the heavy chains each 
contain one variable unit (VL or VH) at their
amino-terminal ends. The light chains also
contain one constant unit (CL); the heavy-chain
constant portion has four domains (CH1, CH2,
CH3, and the hinge region).



by specific amino acids of the light chain and partly 
by specific heavy-chain amino acids. Once the basic 
antibody layout had been established, the amino 
acid sequences of the component light and heavy 
chains were determined by using the homoge- 
neous antibodies made by specific myeloma cells. 
Myelomas are cancerous antibody-producing
(plasma) cells, and in any one animal all the cells
of a myeloma tumor are the descendants of one 
original cancer cell. This explains why all the anti- 
body molecules from any one myeloma have the 
same amino acid sequence. 

Both light- and heavy-chain sequences vary 
from one type of antibody to another, but in a way 
that no one would have predicted initially. Al- 
though each chain has unique sequences, almost 
all of this specificity is restricted to about 100 
amino acids at the amino-terminal ends (the variable,
or V, regions). Half of each light chain and
three-quarters of each heavy chain have almost
identical sequences (the constant, or C, regions)
(Figure 9-1).

Separate Genes for V and C Segments 
Are Proposed 

To account for the constant and variable portions of 
the chains, William Dreyer and Claude Bennett at 
the California Institute of Technology put forward 
a bold hypothesis in 1965. They proposed that the 
V and C regions are coded by separate genes, and 
that in the germ line the C-region segments (CL for
the light chain, CH for the heavy chain) are each
coded for by only one gene, but that the V regions
(VH and VL) are coded for by many thousands of
genes. Dreyer and Bennett further proposed that
a functional antibody is formed when genetic
recombination in the precursor to the plasma cell
brings one of the V genes next to its respective C
gene to yield VL-CL and VH-CH genes. This perceptive
hypothesis won few early converts because it
flew in the face of the general belief that the ar- 
rangement of DNA within a given chromosome 
was effectively immutable, except at meiosis dur- 
ing the formation of the sex cells. 

Messenger RNA Probes Are Used 
to Obtain Support for the Joining 
of V and C Genes 

It was impossible to test this V-C joining hypothe- 
sis until there were direct molecular probes that 
might identify the putative V and C genes. In the 
first experiments, which were done between 1974 
and 1976, mRNA probes that were more than 50 
percent pure were made from myeloma cells. 
These mRNA probes were mixed, under condi- 
tions favoring hybridization, with unfractionated 
homologous myeloma DNA to see if it was possible
to count the number of genes for the single
antibody type that was present in a given myeloma 
cell. The answer for at least one light-chain mRNA 
probe was that there were a very low number of 
genes, and perhaps only one. 

Such experiments, however, could not distin- 
guish V from C sequences, nor could they indicate 
differences in the relative locations of V and C 
sequences in embryonic cells compared with 
myeloma cells. To do that required a way to cut up 
the total myeloma and embryonic cell DNAs into 
reproducible pieces, a procedure that became pos- 
sible only with the ready availability of restriction 
enzymes. Using them, Susumu Tonegawa at the 
Basel Institute of Immunology in Switzerland ob- 
served in the spring of 1976 that V and C se- 
quences that were linked together on the same 
DNA restriction fragment from an antibody-pro- 
ducing mouse myeloma cell were not similarly 
linked together in embryonic DNA (Figure 9-2). 
This classic experiment was done with necessarily 
impure mRNA probes because of the still-effective 
prohibitions against cloning cDNA molecules. 
But as soon as the first suitable cDNA vector was 
approved, the appropriate cDNA probes for V 
and C regions were made and their specificity directly
determined by DNA sequencing. These probes
were then used for selecting genomic segments
that carried the respective antibody genes.
Possession of such reagents has revolutionized our 
understanding of the molecular basis of antibody 
diversity. 
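
The logic of the restriction-fragment comparison can be illustrated with a toy model. This is only a sketch under stated assumptions, not a reconstruction of the actual experiment: each DNA is represented as a list of labeled stretches, the marker `SITE` stands for a hypothetical restriction site, and the two layouts and the function names `fragments` and `on_same_fragment` are invented for illustration.

```python
# Toy model: a DNA is a list of labeled stretches; SITE marks a
# (hypothetical) restriction-enzyme cutting site.
SITE = "SITE"

def fragments(dna):
    """Cut the DNA at every SITE and return the resulting fragments."""
    pieces, current = [], []
    for element in dna:
        if element == SITE:
            pieces.append(current)
            current = []
        else:
            current.append(element)
    pieces.append(current)
    return [p for p in pieces if p]  # drop empty fragments

def on_same_fragment(dna, a, b):
    """True if labels a and b end up on one restriction fragment."""
    return any(a in piece and b in piece for piece in fragments(dna))

# Assumed layouts: V and C far apart in embryonic DNA, adjacent in myeloma DNA.
embryonic = ["V", SITE, "spacer", SITE, "C"]
myeloma = ["spacer", SITE, "V", "C", SITE, "spacer"]

print(on_same_fragment(embryonic, "V", "C"))  # False
print(on_same_fragment(myeloma, "V", "C"))    # True
```

In this picture a probe for V and a probe for C light up the same fragment only when the two sequences have been brought together, which is the pattern reported for myeloma DNA but not for embryonic DNA.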

Figure 9-2

Functional antibody genes are produced by
genetic recombination. The V and C genes were
found to be widely separated in embryonic DNA,
whereas they were found close together in the DNA
of antibody-producing myeloma cells. (Diagram
labels: V genes and C gene widely separated in
embryonic DNA; recombination brings a V gene
close to the C gene; functional antibody gene of
the myeloma plasma cell.)

Figure 9-3

The protein domains of immunoglobulin heavy chains are separated by introns. 



Functional Antibody Genes Are Isolated 
from Myeloma Cells 

The cDNA probes that established the nature and 
number of antibody genes were made from spe- 
cific myelomas whose antibody products had al- 
ready been sequenced. Direct comparisons were 
thus possible between the nucleotide sequences of 
functional antibody genes and the amino acid se- 
quences of the antibodies they specified. Many in- 
trons were found immediately, and, most impor- 
tantly, most were located at junctions between 
functional domains. In the light chain, an intron 
separates almost all of the amino-terminal leader 
sequences from the V segment, and a second in- 
tron divides the V from the C sequences. Within 
heavy-chain genes, introns separate functionally 
related domains (in other words, they separate 
exons coding for domains) more extensively. Each 
of the three domains of the CH protein is clearly
delineated by introns, as is the so-called hinge region
lying between the first and second CH domains
(Figure 9-3). All of these observations have
supported the hypothesis that proteins have 
evolved by the rearrangement of exons. 

Embryonic Cells Are Sources of Unjoined 
V and C Genes 

The structures of the V and C segments before 
they are joined to create functional antibody genes 
were revealed by cloning the appropriate genomic 
DNA segments from embryonic cells and hybri- 
dizing them with probes specific for the V and C 
regions. C-region probes invariably were very
specific, whereas V-region probes often hybridized
to many different V genes. Such cross-hybridization
reflects the fact that the V regions of
different antibodies often differ by only a few
amino acid substitutions. Now we have evidence
for at least 200 VL genes and an equal number of
VH genes. Note that because the specificity of an
antibody is determined by both its VL and VH components,
the number of potentially different antibodies
is at least VL × VH, or 200 × 200 =
40,000.
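
The lower bound quoted above is plain multiplication, on the assumption that any light chain can pair with any heavy chain. A minimal sketch (the function name is hypothetical; the two counts of 200 are the ones given in the text):

```python
def min_antibody_combinations(n_vl_genes: int, n_vh_genes: int) -> int:
    """Lower bound on distinct antibodies if every VL can pair with every VH."""
    return n_vl_genes * n_vh_genes

print(min_antibody_combinations(200, 200))  # 40000
```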

In contrast to the multiplicity of VL and VH
genes, there exist only two CL genes (Cκ and Cλ),
located on different chromosomes, and a single
cluster of linked CH genes. In the mouse, this CH
cluster is located on chromosome 12 and consists
of eight genes that are spaced out over some 200
kilobases (kb) of DNA. The separate CH genes
reflect different functional roles for their gene
products, with, for example, the Cμ gene coding
for the early-appearing immunoglobulin M, and
the several Cγ genes coding for immunoglobulins
that have generally higher specificity and that
predominate in the later stages of an immune response.

Multiple J (Joining) Segments Are 
Attached to Genomic C (Constant) 
Segments 

The number of potential light chains is greatly 
increased by the presence of a cluster of related, 
but not identical, J (for joining) segments that 
reside upstream of each CL gene. The number of
potential heavy chains is likewise increased by a
J-segment cluster located upstream from each CH
cluster. The linkage of a V segment to a C segment
may occur next to any of these J segments, and,
depending on which J segment is used, a different
group of amino acids will be found inserted between
the amino acids encoded by the V segment
and those encoded by the C segment. RNA splicing
thus occurs in such a way that only the J segment
used in the V-C joining event is retained
(Figure 9-4). How splicing can be so regulated 
remains totally mysterious, as does the nature of 
the events that join the V and C genes together. 

The joining event itself is slightly variable, 
which creates additional diversity at the V-JC 
combining site. The site occurs, probably not by 
chance, within the nucleotides coding for amino 
acids that help form the cavity into which antigens
bind. One model for DNA joining postulates that
the respective V and C segments, initially located
far apart on the same DNA molecule, are brought
together by a recombination process that eliminates
the intervening sequences.

Figure 9-4

A VH gene is linked to a CH gene by means
of a J (joining) segment that is located in a
cluster of such segments upstream of the C
genes. After the initial recombination event,
RNA splicing removes all of the other J
segments to produce the mature mRNA.

The sequences at the ends of the V and JC segments
are complementary and could allow formation of
hydrogen-bonded hairpin loops that would align the
appropriate bases ready for cutting and rejoining. There
is some experimental support for this proposal.
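
The two steps described in this section, DNA joining by deletion followed by RNA splicing, can be sketched as list operations. This is a deliberately simplified illustration of the deletion model mentioned above, not a statement of the actual mechanism: the germ-line layout, the segment names, and the functions `join_v_to_j` and `splice` are all assumptions made for the example.

```python
def join_v_to_j(germline, v, j):
    """DNA recombination: delete everything between the chosen V and J."""
    v_index, j_index = germline.index(v), germline.index(j)
    if v_index >= j_index:
        raise ValueError("the V segment must lie upstream of the J segment")
    return germline[: v_index + 1] + germline[j_index:]

def splice(rearranged, v, c="C"):
    """RNA splicing: keep the joined V-J unit and C, drop what lies between."""
    start = rearranged.index(v)
    transcript = rearranged[start : rearranged.index(c) + 1]
    # The first two elements are the joined V-J unit; unused downstream
    # J segments and the intron are removed.
    return transcript[:2] + [c]

# Hypothetical germ-line arrangement of a light-chain locus.
germline = ["V1", "V2", "V3", "J1", "J2", "J3", "J4", "intron", "C"]

rearranged = join_v_to_j(germline, "V2", "J3")
print(rearranged)                # ['V1', 'V2', 'J3', 'J4', 'intron', 'C']
print(splice(rearranged, "V2"))  # ['V2', 'J3', 'C']
```

Choosing a different J segment in `join_v_to_j` changes which J-encoded amino acids sit between the V and C portions of the final message.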

Three Discontinuous Regions of DNA 
Code for Heavy-Chain Amino Acids 

At first it was believed that heavy-chain formation 
followed the same pattern of events as light-chain 
formation. But when the appropriate germ-line 
VH and JH-CH regions were cloned, sequenced, and
compared to the sequences found in the respective
myeloma heavy chains, it became apparent that a
second group of internal amino acids was not
coded by either the VH or JH-CH segments but had
to arise from a third DNA segment. This third
segment, given the name D (for diversity), is sited
between the VH and JH-CH segments. It consists of
a tandem multigene family of related sequences.
The existence of multiple D segments, any one of
which has the possibility of being inserted into the
functional heavy-chain gene, still further increases
the potential number of heavy-chain specificities
(VH × DH × JH) (Figure 9-5). Like the J-encoded
amino acids, the D-encoded amino acids help to
form the antigen-binding sites, so the diversity
they create is likely to be biologically useful.
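
To make the VH × DH × JH product concrete, the combinations can be enumerated directly. The segment pools below are hypothetical and deliberately tiny; the text gives no counts for the D and J segments, so the numbers serve only to show how the product arises.

```python
from itertools import product

# Hypothetical segment pools, chosen small enough to enumerate by eye.
vh_segments = ["VH1", "VH2", "VH3"]
d_segments = ["D1", "D2"]
jh_segments = ["JH1", "JH2", "JH3", "JH4"]

# Every heavy-chain variable gene uses one segment from each pool.
heavy_chain_genes = list(product(vh_segments, d_segments, jh_segments))

print(len(heavy_chain_genes))  # 24, i.e. 3 x 2 x 4
print(heavy_chain_genes[0])    # ('VH1', 'D1', 'JH1')
```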

A DNA Elimination Event Allows 
a VH Gene to Be Attached to
Two Different CH Genes

Recombinant DNA procedures have also swiftly 
solved the puzzle of how a given heavy-chain variable
gene (VH) can be attached first to a constant
segment (Cμ) that is characteristic of the immunoglobulin
M class, and then, during the later immune
response, can transfer its linkage to a constant
segment (Cγ) characteristic of the immunoglobulin
G class. No change of immunological
specificity occurs during this transfer, because the
heavy-chain types differ only in their "constant"
components, all of which are coded by a group of
genes clustered together on the same chromosome.
How this switch happens became crystal-clear
as soon as the appropriate Cμ and
Cγ genes were cloned and their sequences were
compared with those of a functional gene coding
for a known γ-class heavy chain (MOPC 141).
The key observation was the finding of J-segment
bases in the gene coding for the heavy chain of
MOPC 141, despite the absence of any J segment
near the corresponding embryonic Cγ gene. In
contrast, several J segments were found at the
beginning of the embryonic Cμ gene, one of which
exactly corresponded to that observed in the functional
Cγ gene of MOPC 141. Two recombination
events are therefore necessary to generate a functional
gene for γ-class heavy chains. The first joining
event attaches a VH gene to the Cμ gene at one
of its flanking J segments, thereby allowing synthesis
of a μ-class heavy chain. This synthesis continues
until a second recombination event
removes most of the Cμ gene sequences and links
the previously joined VH-JH segment to the intron
sequences flanking a Cγ gene. This leads to the
synthesis of a γ-class heavy chain whose J-coded
sequences bear witness to the prior VH-JH-Cμ
arrangement (Figure 9-6).
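
The second recombination event can be pictured as one more deletion, this time between the already joined V-J unit and the target constant gene. The sketch below is an illustration under assumptions: the locus is reduced to four labeled elements, other constant genes in the cluster are omitted, and the function name `class_switch` is hypothetical.

```python
def class_switch(locus, target_c):
    """Second recombination: delete the constant genes lying between the
    joined V-J unit and the target constant gene."""
    vj_end = locus.index("J") + 1  # keep the joined V and J
    target = locus.index(target_c)
    if target < vj_end:
        raise ValueError("target constant gene must lie downstream of V-J")
    return locus[:vj_end] + locus[target:]

# Assumed layout after the first joining event: V-J sits next to C_mu.
after_first_join = ["VH", "J", "C_mu", "C_gamma"]

print(class_switch(after_first_join, "C_gamma"))  # ['VH', 'J', 'C_gamma']
```

The J element survives the switch, which mirrors the observation that the γ-class gene of MOPC 141 still carries J-segment bases.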

Alternative Splicing Allows Single Cells 
to Make Both μ- and δ-Class Heavy
Chains with Identical VH Segments

Recombinant DNA procedures have also clarified 
another previously puzzling observation — namely, 
that one cell can simultaneously make heavy chains 
that are of two different types but that contain the
same variable region. The initial explanation proposed
for this situation was that one type of heavy
chain might be translated from a very stable long-lived
mRNA that persisted in the cytoplasm long
after the corresponding gene had been eliminated.
Analysis of the structure of the genes involved
with recombinant DNA techniques, however,
suggested an alternative. The variable region (V),
a joining region (J), and the two constant regions
Cμ and Cδ were found to be contiguous.

Figure 9-6

Genetic recombination changes a functional
μ-class gene to a functional γ-class gene.
This recombination event occurs after the
formation of the initial VH-JH link.

The whole complex might thus be transcribed into a
large precursor RNA molecule from which, by
differential splicing, either one or the other of the
constant regions is eliminated (Figure 9-7). This
mechanism was confirmed with cDNA probes
prepared from the appropriate mRNA. Differential
splicing of a common precursor RNA generates
two distinct heavy-chain mRNAs. A dual-splicing
potential also determines whether certain
classes of antibodies are bound to the plasma
membranes of the cells in which they are made or
are secreted to the outside.
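
Differential splicing of one precursor into two messages can be sketched in a few lines. This is an illustrative model only: the precursor is reduced to four labeled regions, and the function name `differential_splice` and its `constants` parameter are assumptions.

```python
def differential_splice(precursor, keep_constant, constants=("C_mu", "C_delta")):
    """Return the mRNA obtained by splicing out every constant region
    except the one named in keep_constant."""
    if keep_constant not in constants:
        raise ValueError(f"unknown constant region: {keep_constant}")
    return [region for region in precursor
            if region not in constants or region == keep_constant]

# One precursor RNA carrying V, J, and both constant regions.
precursor_rna = ["V", "J", "C_mu", "C_delta"]

print(differential_splice(precursor_rna, "C_mu"))     # ['V', 'J', 'C_mu']
print(differential_splice(precursor_rna, "C_delta"))  # ['V', 'J', 'C_delta']
```

Both outputs share the same V and J regions, which is why the two heavy-chain classes made by a single cell have identical variable regions.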

Somatic Mutations Provide a Further 
Source of Immunoglobulin Diversity 

Before the existence of recombinant DNA there 
was seemingly endless debate as to whether the 
specificity for antibody genes lay largely in germ-line
DNA segments or whether most diversity was
created by somatic mutations that occurred during
the multiplication of antibody-producing cells and
their precursors. With the discovery of V and
(D)JC joining it became unambiguously clear that
much antibody diversity was carried in germ-line
segments. Virtually simultaneously, however,
it was found that the exact amino acid sequence
of many antibodies does not precisely correspond
with that predicted by their respective
germ-line sequences. The first direct evidence for
somatic diversification came from analysis of the
mouse light-chain variable regions. Of 19 λ proteins
examined, 12 had the germ-line sequence
while 7 others had one to three amino acid differences.
Since then, somatic diversification has also
been found in VH segments, and significant proportions
of antibody variability must now be ascribed
to somatic mutational events. Although
these mutations occur both in the DNA sequences
that control the specificity of antigen binding
(the hypervariable regions) and in those that code
for the so-called framework regions, most mutations
are observed to affect antigen binding. Equally important,
"mutant" antibodies are largely restricted to
the later stages of the immunological response in
which Cγ- (or Cε-) chain synthesis predominates. It
has thus been asked whether the antibodies produced
by somatic mutations are more efficient
(whether they bind antigen better) than the early- 
appearing antibodies whose specificity comes en- 
tirely from germ-line sequences. Though im- 
munological theory predicts that this should 
occur, preliminary experiments have failed to sup- 
port this idea. 

Establishing the Genes of the Major 
Histocompatibility Complex (MHC) 
Proteins and Their Protein Antibodies 
Through Gene Cloning 

Some 30 years ago, the rejection of foreign skin 
grafts (transplantations) in mice was found to de- 
pend on a group of proteins (H2) that recognized 
the skin cells as "nonself." It was determined that 
these proteins were coded by closely linked genes 
that were mapped to a region of chromosome 17. 
With time, the number of these so-called "his- 
tocompatibility proteins" was found to be far 
greater than was first perceived, and the "major 
histocompatibility complex" (MHC) is now 
known to code for three very different classes of 
proteins. Class I consists of the genes for the highly 
polymorphic H2 transplantation antigens (in hu- 
mans, the HLA antigens), which are expressed on 
virtually all cell surfaces, as well as the genes for 
other surface molecules restricted to specific types 
of differentiated cells (for example, blood-forming 
cells). The class II (immune-response) genes en- 
code a specific group of cell surface molecules that 
are present only on lymphocytes and that control 
the extent of specific immunological responses. 
Several components of the complement system 
that links an immune response to the desired de- 
struction of an unwanted foreign cell by lysis con- 
stitute the class III genes. 

Our understanding of the exact chemistry of 
these MHC proteins has lagged far behind our 
understanding of the chemistry of the immuno- 
globulins themselves, and only with the advent of 
cloning procedures is a comprehensive picture of 
the MHC structures beginning to emerge. In gen- 
eral, the class I H2 and H2-like proteins contain a 
45,000-dalton transmembrane protein attached
noncovalently to a 12,000-dalton protein, β2-microglobulin,
that is separately encoded on chromosome 2.
Each 45,000-dalton chain contains
three external domains, each comprising approximately
90 amino acids, a transmembrane region of approximately
40 amino acids, and a short cytoplasmic
region of about 30 amino acids (Figure 9-8).
This structure reflects the exon-intron arrangement,
with separate exons encoding the signal
leader peptide, each of the exterior domains, the
transmembrane region, and the three regions of
the cytoplasmic component.

Figure 9-8

Major histocompatibility complex proteins and
their protein antibodies. Class I and class II
MHC proteins have variable and constant
regions in their cytoplasmic domains. The
constant regions show some homology with
the constant regions of immunoglobulin
molecules. Class I MHC proteins are
associated with another protein called
β2-microglobulin.

The organization of the class I genes along 
chromosome 17 has been revealed through clon- 
ing 40-kb fragments of mouse DNA into cosmids 
and looking for overlaps between clones containing
class I genes (Figure 9-9). So far, 36 class I
genes encompassing some 837 kb of DNA have
been found. Now the class II and class III genes
are in the process of being mapped, and through
chromosome walking the complete molecular map
of the mouse MHC complex could be obtained
over the next several years.

Figure 9-9

The genes coding for MHC class I, II, and III proteins are linked on mouse
chromosome 17. The 36 class I genes have been cloned on cosmids and
found to occur in 13 clusters containing varying numbers of genes.

Recombinant DNA analysis has thus already 
profoundly affected immunological research. The
separate origins of immunological diversity are in
the process of being sorted out. Equally important,
by being able to go directly to the MHC genes,
which control the extended nature of immunologi- 
cal responses, we are much closer to explaining the 
phenomenological descriptions that have con- 
stituted cell immunology than we would have 
been if we had been limited to the biological ap- 
proaches of the cellular immunologist. 